-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable logging in tests and fix error log flooding from mock chain runtime #1017
Conversation
The error log flooding has now been fixed. There are 2 causes for this:
The fix is as follows:
|
bd5f5d9
to
d5ea005
Compare
@@ -209,6 +209,7 @@ impl From<IdentifiedAnyClientState> for IdentifiedClientState { | |||
#[cfg(test)] | |||
mod tests { | |||
use std::convert::TryFrom; | |||
use test_env_log::test; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we should keep this around. Do you think it would be valuable to have logging enabled in tests going forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think they help especially when the tests are failing. Not having them enabled would only make it more inconvenient for contributors to write new tests and fix failing tests. The logging setup for Rust is not very straightforward, and just figuring out how to enable that can waste minutes to hour of a contributor's time.
Err(e) => error!("received error via event bus: {}", e), | ||
Err(e) => { | ||
error!("received error via event bus: {}", e); | ||
return Err(Kind::Channel.into()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I think it is by design that this match arm does not return Err(..)
here but instead permits the loop to continue. For context, please see #895 (and follow-up #903). The idea is that the chain runtime would continue polling via select!
here even if the monitor's connection to the full node is down (or some other error appears in the event_receiver
), and it's the job of the monitor to retry and eventually heal if the connection to the node is re-established.
It would be good if you can verify that your changes do not introduce a regression. Could you please just double-check that? The recipe for testing is in the description of #903 (see "Tested with").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I was trying earlier to test but was not sure how to exercise the Err
path here. Any tips?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the source code of crossbeam, and the only possible error is when the sender end of the channel is dropped, and so it is unrecoverable. We probably never encounter this in production, because if the channel is disconnected, it will also result in the flooding of error messages in the relayer log.
Also, if you check the other branches, each of them are called with ?
at the end. This means any error happening in those places will break the loop and return error. We should have another issue to address this by having some way to recover from any error inside the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the findings & explanation!
I agree with you that the loop is not fault-tolerant anyway. I originally thought that returning in the Err
path here could make the runtime (and the Hermes process in general) exit early (instead of waiting on the monitor to retry). But it seems there is a regression in master (#1026) and that has nothing to do with the changes in this branch.
We should have another issue to address this by having some way to recover from any error inside the loop.
I agree. If you know how to capture this in an issue, could you please handle that? We didn't notice these problems (yet), but it would be good to track this issue and add a general method for handling recovery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, filed #1025 for that.
Closes: #1018
Description
This PR should cause a large number of error logs to be printed when running cargo tests on
foreign_client::tests
.For contributor use:
docs/
) and code comments.Files changed
in the Github PR explorer.